{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Numpy" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Python lists:\n", "\n", "* are very flexible\n", "* don't require uniform numerical types\n", "* are very easy to modify (inserting or appending objects).\n", "\n", "However, flexibility often comes at the cost of performance, and lists are not the ideal object for numerical calculations." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "This is where **Numpy** comes in. Numpy is a Python module that defines a powerful n-dimensional array object that uses C and Fortran code behind the scenes to provide high performance." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The downside of Numpy arrays is that they have a more rigid structure, and require a single numerical type (e.g. floating point values), but for a lot of scientific work, this is exactly what is needed." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The Numpy module is imported with:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Although in the rest of this course, and in many packages, the following convention is used:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is because Numpy is so often used that it is shorter to type ``np`` than ``numpy``." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating Numpy arrays" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The easiest way to create an array is from a Python list, using the ``array`` function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([10, 20, 30, 40])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "a" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Numpy arrays have several attributes that contain useful information about them:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.ndim # number of dimensions" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "a.shape # shape of the array" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "a.dtype # numerical type" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "*Note: Numpy arrays actually support more than just one integer type and one floating point type - they support signed and unsigned 8-, 16-, 32-, and 64-bit integers, and 16-, 32-, and 64-bit floating point values.*" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "There are several other ways to create arrays. For example, there is an ``arange`` function that can be used similarly to the built-in Python ``range`` function, with the exception that it can take floating-point input:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.arange(10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "np.arange(3, 12, 2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "a = np.arange(1.2, 4.4, 0.1)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Another useful function is ``linspace``, which can be used to create certain number of linearly spaced values within ranges (including the start and endpoint of the specified range):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.linspace(10, 11, 11)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "and a similar function can be used to create logarithmically spaced values between and including limits:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.logspace(1., 4., 7)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Finally, the ``zeros`` and ``ones`` functions can be used to create arrays intially set to ``0`` and ``1`` respectively:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.zeros(10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.ones(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combining arrays" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Numpy arrays can be combined numerically using the standard ``+-*/**`` operators. This is properly one of the most powerful and extremely useful features of Numpy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.array([1,2,3])\n", "y = np.array([4,5,6])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x + 2 * y" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x ** y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Note that this greatly differs from lists:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = [1,2,3]\n", "y = [4,5,6]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x + 2 * y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create an array which contains 11 values logarithmically spaced between $10^{-20}$ and $10^{-10}$." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your solution here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create an array which contains the value 2 repeated 10 times (hint: there are two quick ways to do that - one is by multiplying a list, one by multiplying an array)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your solution here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try using ``np.empty(10)`` and compare the results to ``np.zeros(10)``. Also compare to what the people next to you get for np.empty. What do you think is going on? You may also want to test different array sizes than 10." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your solution here\n", "print(np.empty(10))\n", "print(np.zeros(10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create an array containing 5 times the value 0 as 32-bit floating point number (hint: have a look at the docs for np.zeros; the type you'd be looking for is called float32)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your solution here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Accessing and Slicing Arrays" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Similarly to lists, items in arrays can be accessed individually:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.array([9,8,7])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x[1]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "and arrays can also be **sliced** by specifiying the first and last element of the slice and also the step size (where the last element is exclusive):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y = np.arange(10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "y[:5]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "optionally specifying a step:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "y[0:10:2]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "As for lists, the start, end, and step are all optional, and default to ``0``, ``len(array)``, and ``last`` respectively:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y[::]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also reverse the order by using negative steps:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y[::-2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given an array ``x`` with 10 elements, find the array ``dx`` containing 9 values where ``dx[i] = x[i+1] - x[i]``. Do this without loops and think of how you could obtain ``x[i+1]`` and what the final length of ``dx`` is!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your solution here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multi-dimensional arrays" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Numpy can be used for multi-dimensional arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.array([[1.,2.],[3.,4.]])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x.ndim" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "x.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "y = np.ones([3,2,4]) # ones takes the shape of the array, not the values\n", "y" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "y.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using ``np.linalg.inv()`` we can calculate the **inverse** of a square array, interpreted as a matrix:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([[1,2],[3,4]])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.linalg.inv(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Example: The solution of the system of equations\n", "\n", " 1*x + 2*y = 5\n", " 3*x + 4*y = 2\n", " \n", "can be found using ``np.dot()``, which yields the matrix product of 2-D arrays" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.dot(np.linalg.inv(a), [5,2])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Multi-dimensional arrays can be sliced differently along different dimensions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = np.array([0,1,2,3,4,5])\n", "print(d[::3])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z = np.ones([6,6,6]) # a 6x6x6 3D matrix" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "zz=z[::3, 1:4, :]\n", "\n", "print (zz)\n", "print (zz.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "In addition to an array class, Numpy contains a number of **vectorized** functions, which means functions that can act on all the elements of an array, typically much faster than what could be achieved by looping over the array." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "theta = np.linspace(0., 2.*np.pi, 10)\n", "theta" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "np.sin(theta)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Another useful package is the ``np.random`` sub-package, which can be used to genenerate random numbers:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# uniform distribution between 0 and 1\n", "np.random.random(10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# 10 values from a gaussian distribution with mean 3 and sigma 1\n", "np.random.normal(3., 1., 10)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Another very useful function in Numpy is ``np.loadtxt()`` which makes it easy to read in data from column-based data. For example, consider the file data/autofahrt_2018.txt in the data directory." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can either read it using a single multi-dimensional array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = np.loadtxt('data/autofahrt_2018.txt')\n", "data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Or we can read the individual columns:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "time, ax, ay, az = np.loadtxt('data/autofahrt_2018.txt', unpack=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "time[:10]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are additional options to **skip header rows**, **ignore comments**, **define delimiters**, or **read only certain columns**. See the [numpy.loadtxt documentation](http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html) for more details." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Masking" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The index notation ``[...]`` is not limited to single element indexing, or multiple element slicing, but one can also pass a discrete list/array of indices:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.array([1,6,4,7,9,3,1,5,6,7,3,4,4,3])\n", "x[[1,2,4,3,3,2]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "which is returning a new array composed of elements 1, 2, 4, etc from the original array." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Alternatively, one can also pass a boolean array of ``True/False`` values, called a **mask**, indicating which items to keep:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x[np.array([True, False, False, True, True, True, False, False, True, True, True, False, False, True])]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now this doesn't look very useful because it is very verbose, but now consider that carrying out a comparison with the array will return such a boolean array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x > 3.4" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "It is therefore possible to extract subsets from an array using the following simple notation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.array([1,6,4,7,9,3,1,5,6,7,3,4,4,3])\n", "x[x > 3.4]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Conditions can be combined as usual:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x[(x<3.4) | (x>5.5)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we have to use the *bitwise* comparison here and not the *Boolean* comparison (``|`` vs. ``or`` and ``&`` vs. ``and``)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Of course, the boolean **mask** can be derived from a different array to ``x`` as long as it has the same size:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.linspace(-1., 1., 14)\n", "y = np.array([1,6,4,7,9,3,1,5,6,7,3,4,4,3])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "y[(x > -0.5) & (x < 0.4)]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Since the mask itself is an array, it can be stored in a variable and used as a mask for different arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "keep = (x > -0.5) & (x < 0.4)\n", "x_new = x[keep]\n", "y_new = y[keep]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x_new" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_new" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "A mask can also appear on the left hand side of an assignment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y = np.array([1,6,4,7,9,3,1,5,6,7,3,4,4,3])\n", "y[y > 5] = 0." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The file *data/munich_temperatures_average_with_bad_data.txt* provides the temperature in Munich every day for several years." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read the file using ``np.loadtxt``. Note however, the file contains \"bad\" values, which you can identify by looking at the minimum and maximum temperatures. Use masking to get rid of the bad temperature values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "# your solution here\n" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }